Robinia: Scalable Framework for Data-intensive Scientific Computing on Wide Area Network
Authors
Abstract
With data from scientific instruments and models growing continuously, data exploration has become one of the four paradigms of scientific research. This growth demands faster, larger-scale, and more complex processing, and parallelism is increasingly important for scientific data analysis applications. However, obstacles such as unstable wide-area networks and heterogeneity among computing platforms make it difficult to build scalable parallel scientific applications, especially wide-area parallel applications that must process big data from geographically distributed research institutes to enable complex data analysis for "great challenge problems". In this paper, we propose Robinia, a data-intensive computing framework that exploits parallelism among processing nodes over a wide-area network for data-intensive analysis of scientific big data. Robinia integrates distributed resources such as scientific data, processing algorithms, and storage services through a platform-independent framework; provides a unified execution environment for distributed spatial applications based on wide-area networks; and helps them exploit parallelism through a well-defined web-based programming interface. Experiments on a prototype system and demo applications show that scientific analysis applications based on Robinia can achieve higher performance and better scalability by simultaneously analyzing big data stored in a distributed manner across a wide-area network such as the Internet.
Similar resources
Data Replication-Based Scheduling in Cloud Computing Environment
Abstract— High-performance computing and vast storage are two key factors required for executing data-intensive applications. In comparison with traditional distributed systems like data grid, cloud computing provides these factors in a more affordable, scalable and elastic platform. Furthermore, accessing data files is critical for performing such applications. Sometimes accessing data becomes...
Biomolecular committor probability calculation enabled by processing in network storage
Computationally complex and data intensive atomic scale biomolecular simulation is enabled via processing in network storage (PINS): a novel distributed system framework to overcome bandwidth, compute, storage, organizational, and security challenges inherent to the wide-area computation and storage grid. PINS is presented as an effective and scalable scientific simulation framework to meet the...
Scalable Bulk Data Transfer in Wide Area Networks
Bulk data transfer in wide area networks (WAN) requires scalable and high network bandwidth. In this paper, we identify a number of the scalability limitations that affect the full utilization of peak theoretical network bandwidth. In addition, we study and classify different offered approaches to overcome some of the identified limitations and increase network bandwidth among Grid components i...
Simulation of Terabit Data Flows for Exascale Applications
Scientific workflows are increasingly drawing attention as both data and compute resources are getting bigger, heterogeneous, and distributed. Many science workflows are both compute and data intensive and use distributed resources. This situation poses significant challenges in terms of real-time remote analysis and dissemination of massive datasets to scientists across the community. These ch...
A New Framework for Increasing the Sustainability of Infrastructure Measurement of Smart Grid
Advanced Metering Infrastructure (AMI) is one of the most significant applications of the Smart Grid. It is used to measure, collect, and analyze data on power consumption. In the AMI network, smart meter traffic is aggregated at intermediate aggregators and forwarded to the Meter Data Management System (MDMS). The infrastructure used in this network should be reliable, real-time an...